Abstract

A sender wishes to broadcast an $n$ character word $x \in \mathbb{F}^n$ (for a field $\mathbb{F}$) to $n$ receivers
$R_1, \ldots, R_n$. Every receiver has some side information on $x$ consisting of a subset of the
characters of $x$. The side information of the receivers is represented by a graph $G$ on $n$ vertices in
which $\{i,j\}$ is an edge if and only if $R_i$ knows $x_j$. In the index coding problem the goal is to
encode $x$ using a minimum number of characters in $\mathbb{F}$ in a way that enables every $R_i$ to retrieve
the $i$th character $x_i$ using the encoded message and the side information. An index code is
linear if the encoding is linear, and in this case the minimum possible length is known to be equal
to a graph parameter called minrank (Bar-Yossef et al., FOCS'06). Several bounds on the
minimum length of an index code for side information graphs $G$ were shown in the study of index
coding by Bar-Yossef et al. (FOCS'06), Lubetzky and Stav (FOCS'07), Alon et al. (FOCS'08), and
Blasiak et al. (manuscript, arXiv'10). However, the minimum length of an index code for the
random graph $G(n,p)$ is far from being understood.

In this paper we initiate the study of the typical minimum length of a linear index code for
$G(n,p)$ over a field $\mathbb{F}$. First, we prove that for every constant size field $\mathbb{F}$ and a constant $p$, the
minimum length of a linear index code for $G(n,p)$ over $\mathbb{F}$, i.e., the minrank of $G(n,p)$ over $\mathbb{F}$,
is almost surely $\Omega(\sqrt{n})$. Second, we introduce and study the following two restricted models of
index coding:

1. A locally decodable index code is an index code in which the receivers are allowed to query
at most $q$ characters from the encoded message. We prove that the minimum length of
a linear locally decodable index code for $G(n,p)$ over $\mathbb{F}$ with $q$ queries is almost surely
$\tilde{\Omega}(n/q)$ assuming that $q = \tilde{\Omega}(n^{1/3})$. In particular, for locally decodable index codes with
some $q = o(\sqrt{n})$ we get an $\omega(\sqrt{n})$ lower bound.

2. A low density index code is a linear index code in which every character of the word $x$
affects at most $q$ characters in the encoded message. Equivalently, it is a linear code whose
generator matrix has at most $q$ nonzero entries in each row. We prove that in order to
show an $\omega(\sqrt{n})$ lower bound on the minimum length of a linear index code for $G(n,p)$
over $\mathbb{F}$ it suffices to show such a lower bound on the length of a low density index code
for $G(n,p)$ over $\mathbb{F}$ with some $q = \omega(1)$. In addition, we prove that the minimum length of
a low density index code for $G(n,p)$ over $\mathbb{F}$ is almost surely at least $n^{1-\varepsilon}$ for $q = 2$ and at
least $n^{2/3-\varepsilon}$ for $q = 3$ for any sufficiently small constants $p$ and $\varepsilon > 0$.
1 Introduction

In the index coding problem, a sender wishes to broadcast an $n$ character word $x \in \mathbb{F}^n$ (for a finite
field $\mathbb{F}$) to $n$ receivers $R_1, \ldots, R_n$ in a way that enables every $R_i$ to retrieve the $i$th character $x_i$.
Every receiver has some side information on $x$. The side information is represented by a directed
graph $G$ on the vertex set $[n] = \{1, 2, \ldots, n\}$ in which a vertex $i$ is connected to a vertex $j$ if and only
if the receiver $R_i$ knows $x_j$. Given a side information graph $G$, the goal is to find a coding scheme
of minimum length, by which every receiver $R_i$ is able to retrieve $x_i$ given the encoded message
and the side information that it has on $x$ according to $G$. The setting is naturally extended to
undirected graphs, in which an edge $\{i,j\}$ means that $R_i$ knows $x_j$ and $R_j$ knows $x_i$.

For example, assume that every receiver $R_i$ knows $x_j$ for every $j \in [n] \setminus \{i\}$. The corresponding
side information graph is the complete graph on the vertex set $[n]$. In this case, broadcasting the
sum $\sum_{j \in [n]} x_j$ over $\mathbb{F}$ enables every receiver $R_i$ to retrieve $x_i$, and hence the minimum message
length required here is 1.
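
The complete-graph example can be checked directly over $\mathbb{F}_2$, where the sum is an XOR. The following is a minimal Python sketch under that assumption; the function names `encode_sum` and `decode` are illustrative and do not come from the paper.

```python
from functools import reduce
from operator import xor

def encode_sum(x):
    """Broadcast a single character: the sum of all x_j over F_2 (XOR)."""
    return reduce(xor, x, 0)

def decode(i, message, x):
    """Receiver R_i knows every x_j with j != i and recovers x_i from the sum."""
    side_sum = reduce(xor, (x[j] for j in range(len(x)) if j != i), 0)
    return message ^ side_sum

x = [1, 0, 1, 1, 0]
message = encode_sum(x)
recovered = [decode(i, message, x) for i in range(len(x))]
```

Every receiver uses only the one broadcast character and the characters it already knows, which is why a message of length 1 suffices here.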

The study of index coding was initiated by Birk and Kol in [6] and further developed by
Bar-Yossef, Birk, Jayram and Kol in [5]. This research is motivated by applications, such as video
on demand and wireless networking, in which a network transmits information to clients, and
during the transmission every client misses some of the information. At this step, the clients have
side information on the transmitted information, and the network is interested in minimizing the
broadcast length in a way that enables the clients to decode their target (see, e.g., [23]).

Research on index coding is motivated by several questions in theoretical computer science.
For example, index coding is a natural version of the one-way communication complexity problem of
the indexing function studied in [14]. In this problem, Alice is given an $n$ bit string $x$, sends a single
message to Bob, and Bob, given an index $i$, should be able (possibly probabilistically) to discover
$x_i$. The goal is to minimize the length of Alice's message. The index coding problem over $\mathbb{F}_2$ is
equivalent to this question once we restrict Bob to act deterministically and allow him to use some
side information on $x$, depending on $i$. The study of index coding is also motivated by the more
general problem of network coding, introduced by Ahlswede et al. [1]. El Rouayheb et al. showed
in [10] that network coding instances can be efficiently reduced to index coding instances. Hence,
understanding index coding capacities is motivated by applications in computational complexity
regarding deciding and approximating the network coding problem (see [16, 15]).

For a graph $G$ and a field $\mathbb{F}$ we denote by $\beta_1(G)$ the minimum length of an index code for $G$
over $\mathbb{F}$. This graph parameter is well-known to be related to several classical graph parameters.
Indeed, for an undirected graph $G$, $\beta_1(G)$ is bounded from below by $\alpha(G)$, the maximum size of
an independent set in $G$, as follows from the fact that an independent set in $G$ corresponds to a
set of receivers with no mutual information. On the other hand, $\beta_1(G)$ is bounded from above by
$\chi(\overline{G})$, the clique cover number of $G$, as follows from broadcasting the sum over $\mathbb{F}$ of the characters
corresponding to the vertices in every clique in an optimal clique cover.
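
The clique cover upper bound can be illustrated with a small Python sketch over $\mathbb{F}_2$. It assumes the cover is given as a partition of the vertices into cliques; the names `clique_cover_encode` and `clique_cover_decode` and the example graph are hypothetical choices made for illustration.

```python
def clique_cover_encode(x, cliques):
    """One broadcast character per clique: the F_2-sum of its members' characters."""
    return [sum(x[j] for j in clique) % 2 for clique in cliques]

def clique_cover_decode(i, message, x, cliques):
    """R_i subtracts the characters it knows (its clique-mates) from its clique's sum."""
    k = next(idx for idx, clique in enumerate(cliques) if i in clique)
    known = sum(x[j] for j in cliques[k] if j != i) % 2
    return (message[k] - known) % 2

# Path 0 - 1 - 2 - 3, covered by the two cliques {0, 1} and {2, 3}.
cliques = [{0, 1}, {2, 3}]
x = [1, 1, 0, 1]
message = clique_cover_encode(x, cliques)
recovered = [clique_cover_decode(i, message, x, cliques) for i in range(len(x))]
```

The message length equals the number of cliques in the cover, so an optimal cover gives a code of length equal to the clique cover number.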

In [5], Bar-Yossef et al. identified an algebraic graph parameter, called minrank^1 and denoted
by $\mathrm{minrk}_{\mathbb{F}}(G)$, which upper bounds $\beta_1(G)$. Interestingly, they proved that the minrank of a graph
$G$ over $\mathbb{F}$ equals the minimum length of a linear index code for $G$ over $\mathbb{F}$ (i.e., an index code whose
encoding function is linear). They also proved that their upper bound is tight and is equal to
$\beta_1(G)$ for several graph families for the binary field $\mathbb{F}_2$. This includes directed acyclic graphs,
perfect graphs, odd holes (undirected odd-length cycles of length at least 5) and odd anti-holes
(complements of odd holes). These results raised the question whether the minrank parameter
characterizes the minimum length of general index codes. This question was answered in the
negative by Lubetzky and Stav [18], who showed that for any $\varepsilon > 0$ and a sufficiently large $n$
there is an $n$ vertex graph $G$ with $\beta_1(G) \le n^{\varepsilon}$ and $\mathrm{minrk}_{\mathbb{F}_2}(G) \ge n^{1-\varepsilon}$ (see [3] for additional
counterexamples). We note that the proof in [18] uses a property of the minrank (see also [12]),
saying that for every field $\mathbb{F}$ and an $n$ vertex undirected graph $G$,

$$\mathrm{minrk}_{\mathbb{F}}(G) \cdot \mathrm{minrk}_{\mathbb{F}}(\overline{G}) \ge n. \qquad (1)$$

^1 See Definition 2.1.

The first to define the minrank parameter was Haemers [11, 12], who related it to what is
known as the Shannon capacity of graphs introduced in [22]. Haemers showed that for every field
$\mathbb{F}$ and an undirected graph $G$, $\alpha(G) \le c(G) \le \mathrm{minrk}_{\mathbb{F}}(G)$, where $c(G)$ stands for the Shannon
capacity of $G$. He also showed that there are graphs for which the minrank upper bound on the
Shannon capacity is tighter than the one given by the well-known Lovász $\vartheta$-function introduced
in [17]. We note that calculating the minrank of a given input graph is known to be NP-hard [21],
as opposed to the efficiently computable Lovász $\vartheta$-function.

The following theorem summarizes some of the bounds mentioned above. 

Theorem 1.1 ([11, 12, 5]). For every field $\mathbb{F}$ and an undirected graph $G$,

$$\alpha(G) \le \beta_1(G) \le \mathrm{minrk}_{\mathbb{F}}(G) \le \chi(\overline{G}).$$

All the inequalities in the above statement are known to be strict for certain graphs. This makes
the task of understanding $\beta_1(G)$ challenging. A fundamental parameter to study in this context
is the typical value of $\beta_1(G)$ for random graphs $G$. This question was raised by Lubetzky and Stav
in [18] for the well-known random graph $G(n, \frac{1}{2})$, where $G(n,p)$ denotes the random undirected
graph with $n$ vertices and edge probability $p$. In this paper we focus on linear index codes and
study the following question:

What is the typical minimum length of a linear index code for the random graph $G(n,p)$ over $\mathbb{F}$?

Equivalently, we are asking for the typical minrank over $\mathbb{F}$ of the random graph $G(n,p)$.

Let us start with some bounds yielded by Theorem 1.1. Both the independence number and
the clique cover number of G{n, p) are well understood (see 13 for the former and 13 [HI for the 
latter). For a constant edge probability $p$, we obtain that almost surely (i.e., with probability that
tends to 1 as $n$ tends to infinity),

$$(1 \pm o(1)) \cdot \frac{2\log(pn)}{\log\frac{1}{1-p}} \le \mathrm{minrk}_{\mathbb{F}}(G(n,p)) \le (1 \pm o(1)) \cdot \frac{n \log\frac{1}{p}}{2\log((1-p)n)}.$$

In short, for a constant $p$, almost surely, $\Omega(\log n) \le \mathrm{minrk}_{\mathbb{F}}(G(n,p)) \le O(\frac{n}{\log n})$. The gap between
these lower and upper bounds is exponential, and, surprisingly, no better bounds are known to
hold almost surely for $G(n,p)$. Yet, it is plausible to expect the minrank of $G(n,p)$ to be much
higher than the $\Omega(\log n)$ lower bound, since the bound in (1) implies that the expected minrank
of $G(n,p)$ is $\Omega(\sqrt{n})$ for $p = \frac{1}{2}$ (and hence for any $p \le \frac{1}{2}$ as well). To see this, notice that if
$G$ is distributed according to $G(n, \frac{1}{2})$ then so is its complement. By (1), at least one of
$\mathrm{minrk}_{\mathbb{F}}(G)$ and $\mathrm{minrk}_{\mathbb{F}}(\overline{G})$ is at least $\sqrt{n}$, and hence the probability that
$\mathrm{minrk}_{\mathbb{F}}(G) \ge \sqrt{n}$ is at least $\frac{1}{2}$. We note, though, that any $\omega(\sqrt{n})$ lower bound on the expectation
above would imply an $\omega(\sqrt{n})$ lower bound which holds almost surely, as follows from the large
deviation inequality for vertex exposure martingale (see, e.g., [4], Chapter 7). Understanding
the true value of $\mathrm{minrk}_{\mathbb{F}}(G(n,p))$ and, more specifically, the question whether one can show an
$\omega(\sqrt{n})$ lower bound on it, are the driving force of this work.



1.1 Our Contribution 

In the current paper we study the typical minimum length of a linear index code for the random
graph $G(n,p)$ over a field $\mathbb{F}$. We start by showing that an $\Omega(\sqrt{n})$ lower bound holds with
probability that (exponentially) tends to 1 as $n$ tends to infinity (and not only in expectation). In
addition, the bound holds for every constant size field $\mathbb{F}$ and a constant edge probability $p$.^2

Theorem 1.2. For every constant size field $\mathbb{F}$ and a constant $p \in (0,1)$, almost surely

$$\mathrm{minrk}_{\mathbb{F}}(G(n,p)) = \Omega(\sqrt{n}).$$

Observe that Theorem 1.2 implies that the random graph $G(n, \frac{1}{2})$ almost surely has an
exponential gap between its independence number and its minrank over any constant size field.
In [2], Alon conjectured that the Shannon capacity of $G(n, \frac{1}{2})$ satisfies $c(G(n, \frac{1}{2})) = O(\log n)$
almost surely. This, if true, would imply an exponential gap between the Shannon capacity and the
minrank upper bound of Haemers [12] on it for a typical graph $G(n, \frac{1}{2})$.

In the attempt to understand where the minrank of $G(n,p)$ exactly lies in the range from $\sqrt{n}$
to $\frac{n}{\log n}$, we introduce and study two natural restricted models of index coding.

Locally decodable index coding. In our first model we study index codes in which the decoders
are allowed to query a limited number of characters from the encoded message. More precisely,
these are index codes in which the sender maps $x \in \mathbb{F}^n$ to an encoded message, and each of the
receivers $R_i$ should be able to recover $x_i$ using at most $q$ queries to the encoded message and the
information that the receiver has on $x$ according to the side information graph. The following
theorem says that every linear locally decodable index code for $G(n,p)$ over $\mathbb{F}$ with $q$ significantly
smaller than $\sqrt{n}$ almost surely has length much higher than $\sqrt{n}$. The $\tilde{\Omega}$ notation is used to hide
factors which are logarithmic in $n$.

Theorem 1.3. For every constant size field $\mathbb{F}$ and a constant $p \in (0,1)$, if there exists a linear index code
of length $\ell$ for $G(n,p)$ over $\mathbb{F}$, such that every decoding function queries at most $q = \tilde{\Omega}(n^{1/3})$ characters
from the encoded message, then almost surely $\ell = \tilde{\Omega}(\frac{n}{q})$.

We note that a locally decodable index code is a natural analogue of the widely studied object
known as locally decodable codes introduced by Katz and Trevisan [13]. Roughly speaking, locally
decodable codes enable a probabilistic decoding of any character of the original message by
looking at a limited number of characters in a possibly corrupted encoded message.

Low density index coding. The second model we study consists of linear index codes in which
every character of the word $x$ (that the sender wishes to broadcast) affects a limited number, say
$q$, of characters in the encoded message. Such codes are generated by generator matrices in which
every row has at most $q$ nonzero entries, thus we call them low density generator matrix index codes
(or, in short, low density index codes).

Low density codes are usually not so useful in coding theory. The reason is that such codes
have minimum distance at most $q$, whereas, in most applications, one desires codes of large
minimum distance. However, for our purposes such codes turn out to play a major role. More specifically,
our next theorem says that improving the $\sqrt{n}$ lower bound on the length of low density index
codes for $G(n,p)$ will imply such an improvement on the length of linear index codes for $G(n,p)$
in general. This is quite surprising since low density index codes intuitively seem significantly
weaker than general linear index codes. We state this result here informally, and the formal
statement can be found in Section 6.

^2 In fact, our proof provides a lower bound also for the case that $|\mathbb{F}|$ and $p$ depend on $n$. For the full statement of this
theorem see Theorem 4.3.

Theorem 1.4 (informal). Assume that every linear index code for $G(n,p)$ over $\mathbb{F}$, with at most $q = \omega(1)$
nonzero entries in a row of its generator matrix, has length $\omega(\sqrt{n})$ with high probability. Then, almost
surely, $\mathrm{minrk}_{\mathbb{F}}(G(n,p)) = \omega(\sqrt{n})$.

Theorem 1.4 motivates studying lower bounds on the length of low density index codes for
$G(n,p)$. Observe that the minimum length of a low density index code with $q = 1$ for a graph $G$
equals the clique cover number $\chi(\overline{G})$. This implies a tight lower bound of $\Theta(\frac{n}{\log n})$ for $q = 1$. We
are also able to prove $\omega(\sqrt{n})$ lower bounds for low density index codes for $q = 2$ and $q = 3$, as
stated below.

Theorem 1.5. For every constant size field $\mathbb{F}$ and sufficiently small constants $\varepsilon, p > 0$,

1. every linear index code for $G(n,p)$ over $\mathbb{F}$, in which every character of the sent word affects at most
2 characters of the encoded message, almost surely has length at least $n^{1-\varepsilon}$, and

2. every linear index code for $G(n,p)$ over $\mathbb{F}$, in which every character of the sent word affects at most
3 characters of the encoded message, almost surely has length at least $n^{2/3-\varepsilon}$.



1.2 Outline 

The remainder of the paper is organized as follows. In Section 2 we provide some background
preliminaries needed throughout the paper. In Section 3 we show that the minimum length of an
index code for the (undirected) graph $G(n,p)$ is similar to that of directed random graphs. This
enables us to simplify the presentation of our proofs by considering the directed random graph
model. In Section 4 we prove the $\Omega(\sqrt{n})$ lower bound given in Theorem 1.2. In Section 5 we prove
our result on locally decodable index codes, and in Section 6 we prove our results on low density
index codes. The final Section 7 discusses some concluding remarks and open questions.

2 Preliminaries 

In the index coding problem a sender wishes to broadcast a word $x \in \mathbb{F}^n$ (for a field $\mathbb{F}$) to $n$
receivers $R_1, \ldots, R_n$. Every receiver $R_i$ knows some fixed subset of the characters of $x$ and is
interested solely in the character $x_i$. An $\ell$-index code for this setting is a length $\ell$ code over $\mathbb{F}$, which
enables $R_i$ to recover $x_i$ for every $x \in \mathbb{F}^n$ and $i \in [n]$.

The index coding problem can be stated as a graph parameter. For a directed graph $G$ and a
vertex $v$ let $N_G^+(v)$ denote the set of out-neighbors of $v$ in $G$, and for $x \in \mathbb{F}^n$ and $S \subseteq [n]$ let $x|_S$
denote the restriction of $x$ to the coordinates of $S$. The setting of the definition of an index code is
characterized by the directed side information graph $G$ on the vertex set $[n]$ where $(i,j)$ is an edge if
and only if the receiver $R_i$ knows $x_j$. An $\ell$-index code for $G$ over $\mathbb{F}$ is a function $E : \mathbb{F}^n \to \mathbb{F}^\ell$ and
functions $D_1, \ldots, D_n$, so that for all $i \in [n]$ and $x \in \mathbb{F}^n$, $D_i(E(x), x|_{N_G^+(i)}) = x_i$. The definition of an
index code is naturally extended to undirected graphs by replacing every undirected edge by two
oppositely directed edges.

We say that the index code is linear if the encoding function $E$ is linear. It is not difficult to see
that in the linear case it can be assumed, without loss of generality, that the linear function $E$ is
homogeneous. This means that there exist vectors $e_1, \ldots, e_\ell \in \mathbb{F}^n$ such that every $x \in \mathbb{F}^n$ is mapped
by $E$ to $E(x) = (\langle e_1, x \rangle, \langle e_2, x \rangle, \ldots, \langle e_\ell, x \rangle)$. For example, for the binary field $\mathbb{F}_2$, every coordinate
of $E(x)$ is the xor of a certain subset of the coordinates of $x$.

Bar-Yossef et al. [5] showed that the minimum length of a linear index code for $G$ over $\mathbb{F}$ equals
$\mathrm{minrk}_{\mathbb{F}}(G)$, a graph parameter defined as follows.

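Definition 2.1. Let $A = (a_{ij})$ be an $n$ by $n$ matrix over some field $\mathbb{F}$. We say that $A$ represents an $n$
vertex graph $G$ over $\mathbb{F}$ if $a_{ii} \neq 0$ for all $i$, and $a_{ij} = 0$ whenever $i \neq j$ and $(i,j)$ is not an edge in $G$. The
minrank of a graph $G$ over $\mathbb{F}$ is defined as

$$\mathrm{minrk}_{\mathbb{F}}(G) = \min\{\mathrm{rank}_{\mathbb{F}}(A) \mid A \text{ represents } G \text{ over } \mathbb{F}\}.$$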
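
For very small graphs the definition can be evaluated by exhaustive search. The following Python sketch does this over $\mathbb{F}_2$ only, where a nonzero diagonal entry must equal 1; the helper names `rank_gf2`, `minrank_gf2` and `undirected` are hypothetical, and the search is exponential in the number of edges, so it is meant purely as an illustration of the definition.

```python
from itertools import product

def rank_gf2(rows):
    """Rank over F_2 of a matrix whose rows are given as integer bitmasks."""
    rank = 0
    rows = list(rows)
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def minrank_gf2(n, edges):
    """Brute-force minrank over F_2 of a directed graph on vertices 0..n-1.

    `edges` is a set of ordered pairs (i, j) with i != j; an undirected edge
    is given as both (i, j) and (j, i).
    """
    free = sorted(edges)  # entries a_ij that may be chosen freely
    best = n
    for bits in product((0, 1), repeat=len(free)):
        rows = [1 << i for i in range(n)]  # diagonal entries equal 1
        for (i, j), b in zip(free, bits):
            if b:
                rows[i] |= 1 << j
        best = min(best, rank_gf2(rows))
    return best

def undirected(pairs):
    return {(i, j) for i, j in pairs} | {(j, i) for i, j in pairs}

c5 = undirected([(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])
k4 = undirected([(i, j) for i in range(4) for j in range(i + 1, 4)])
```

On the complete graph the search finds the all-ones matrix of rank 1, matching the example in the introduction, while on a graph with no edges only the identity matrix is allowed.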

Let $E : \mathbb{F}^n \to \mathbb{F}^\ell$ be a linear $\ell$-index code for a graph $G$ and identify it with its generator matrix
in $\mathbb{F}^{n \times \ell}$. Denote the $i$th column of $E$ by $e_i$ and denote $\mathrm{span}(E) = \mathrm{span}(e_1, \ldots, e_\ell)$. For a message
$x \in \mathbb{F}^n$, the $i$th receiver is interested in $x_i$. In order to discover $x_i$ the $i$th receiver is allowed to
use the codeword $E(x) = (\langle e_1, x \rangle, \langle e_2, x \rangle, \ldots, \langle e_\ell, x \rangle)$ and the side information that it has on $x$
according to $G$. It can be seen that the $i$th receiver is able to discover $x_i$ if and only if there exists
a vector in $\mathrm{span}(E)$ that is nonzero in the $i$th entry and is zero in all the entries that correspond to
non-neighbors of $i$. This motivates the following definition which will be useful throughout the
paper.

Definition 2.2. For a graph $G$ on the vertex set $[n]$, a vector $v \in \mathbb{F}^n$ satisfies a vertex $i \in [n]$ if $v_i \neq 0$
and $v_j = 0$ for every $j \in [n] \setminus \{i\}$ such that $i$ is not connected to $j$ in $G$.

Using this terminology, $E$ is a linear index code for $G$ if and only if every vertex $i \in [n]$ is satisfied
by a vector in $\mathrm{span}(E)$.
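
The criterion above can be tested mechanically for small instances. Here is a minimal Python sketch over $\mathbb{F}_2$ that enumerates $\mathrm{span}(E)$ explicitly; the function names and the representation of the graph as a list of out-neighbor sets are assumptions made for illustration.

```python
from itertools import product

def span_gf2(vectors):
    """All F_2-linear combinations of the given 0/1 vectors, as a set of tuples."""
    n = len(vectors[0])
    return {
        tuple(sum(c * e[k] for c, e in zip(coeffs, vectors)) % 2 for k in range(n))
        for coeffs in product((0, 1), repeat=len(vectors))
    }

def satisfies(v, i, out_neighbors):
    """v satisfies vertex i: v[i] != 0 and v[j] == 0 for every non-neighbor j != i."""
    return v[i] != 0 and all(
        v[j] == 0 for j in range(len(v)) if j != i and j not in out_neighbors[i]
    )

def is_linear_index_code(vectors, out_neighbors):
    """True if every vertex is satisfied by some vector in the span of `vectors`."""
    span = span_gf2(vectors)
    return all(
        any(satisfies(v, i, out_neighbors) for v in span)
        for i in range(len(out_neighbors))
    )

# Undirected path 0 - 1 - 2 - 3, given by out-neighbor sets.
path = [{1}, {0, 2}, {1, 3}, {2}]
good = [(1, 1, 0, 0), (0, 0, 1, 1)]   # sums over the cliques {0,1} and {2,3}
bad = [(1, 1, 1, 1)]                  # a single global sum
```

In this example the two clique sums form a linear index code for the path, whereas the single global sum does not, since vertex 0 does not know $x_2$ and $x_3$.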

We need the following simple claim, in which we use $B_n(r)$ to denote the set of vectors in $\mathbb{F}^n$
of Hamming weight (i.e., number of nonzero entries) at most $r$.

Claim 2.3. For every field $\mathbb{F}$, $n, \ell, r \in \mathbb{N}$, and a basis $E \in \mathbb{F}^{n \times \ell}$, the number of indices of coordinates that
are nonzero in at least one vector in $\mathrm{span}(E) \cap B_n(r)$ is at most $r \cdot \ell$.

Proof: Consider the following process: start with $i = 1$, and at the $i$th step choose a vector $v_i \in
\mathrm{span}(E) \cap B_n(r)$ that has a nonzero coordinate which is zero in all the previously chosen vectors
$v_1, \ldots, v_{i-1}$. Clearly, the process does not terminate as long as there is a coordinate that is nonzero
in at least one vector in $\mathrm{span}(E) \cap B_n(r)$ but is zero in all the chosen $v_i$'s. Observe that for every
$i$, the vectors $v_1, \ldots, v_i$ are linearly independent. Therefore, since the dimension of $\mathrm{span}(E)$ is at
most $\ell$, the process terminates after at most $\ell$ steps. At each step we have at most $r$ new indices of
nonzero coordinates since the $v_i$'s are in $B_n(r)$. This implies that the number of indices of
coordinates that are nonzero in at least one vector in $\mathrm{span}(E) \cap B_n(r)$ is at most $r \cdot \ell$. ■
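
The claim is easy to sanity-check numerically. The sketch below works over $\mathbb{F}_2$ with small, arbitrarily chosen parameters; the function name `low_weight_support` and the parameter values are illustrative assumptions, not part of the paper.

```python
import random
from itertools import product

def low_weight_support(vectors, r):
    """Indices that are nonzero in some vector of span(E) of weight <= r (over F_2)."""
    n = len(vectors[0])
    support = set()
    for coeffs in product((0, 1), repeat=len(vectors)):
        v = [sum(c * e[k] for c, e in zip(coeffs, vectors)) % 2 for k in range(n)]
        if sum(v) <= r:
            support.update(k for k in range(n) if v[k])
    return support

# Random trials: the support never exceeds r * ell.
rng = random.Random(0)
n, ell, r = 12, 3, 2
worst = 0
for _ in range(200):
    E = [[rng.randint(0, 1) for _ in range(n)] for _ in range(ell)]
    worst = max(worst, len(low_weight_support(E, r)))

# A tight example: three disjoint weight-2 vectors cover exactly r * ell = 6 indices.
tight = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]]
tight_support = low_weight_support(tight, 2)
```

The second example shows that the bound $r \cdot \ell$ cannot be improved in general.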

Let $G(n,p)$ denote the random undirected graph with $n$ vertices and edge probability $p$, and
let $\vec{G}(n,p)$ denote the random directed graph with $n$ vertices and edge probability $p$. We say that
$G(n,p)$, resp. $\vec{G}(n,p)$, satisfies a graph property almost surely if the probability that $G(n,p)$, resp.
$\vec{G}(n,p)$, satisfies this property tends to 1 as $n$ tends to infinity.

Throughout the paper we ignore floors and ceilings whenever appropriate as this does not 
affect the asymptotic nature of our results. 

3 $G(n,p)$ versus $\vec{G}(n,p)$

In this section we prove a lemma saying that the minimum length of an index code for $G(n,p)$ and
the minimum length of an index code for $\vec{G}(n,p)$ behave similarly for constant edge probabilities.
We start with the intuitive proof idea and then turn to the proof. In the directed graph $\vec{G}(n,p)$ the
probability that two vertices are connected by two oppositely directed edges is $p^2$. Hence, $\vec{G}(n,p)$
essentially contains a copy of $G(n,p^2)$. On the other hand, the probability that two vertices in
$\vec{G}(n,p)$ are not connected at all is $(1-p)^2$, so $\vec{G}(n,p)$ is essentially contained in $G(n, 1-(1-p)^2)$.
Therefore, a lower bound on the minimum length of an index code for $\vec{G}(n,p)$ for some constant
$p$ implies a lower bound on that of $G(n,p')$ for some constant $p'$ and vice versa.

Lemma 3.1. For every field $\mathbb{F}$, $n, \ell \in \mathbb{N}$ and $p \in (0,1)$, let

$$P_1 = \Pr[\text{There exists an } \ell\text{-index code for } G(n,p^2) \text{ over } \mathbb{F}],$$

$$P_2 = \Pr[\text{There exists an } \ell\text{-index code for } \vec{G}(n,p) \text{ over } \mathbb{F}],$$

$$P_3 = \Pr[\text{There exists an } \ell\text{-index code for } G(n,p(2-p)) \text{ over } \mathbb{F}].$$

Then, $P_1 \le P_2 \le P_3$.

In addition, the inequalities hold when we require the index codes in the three events to be linear,
$(q,\ell)$-locally decodable for some $q \in \mathbb{N}$, or $(q,\ell)$-low density for some $q \in \mathbb{N}$.^3

Proof: We first show that $P_1 \le P_2$. For a directed graph $G$, let $\hat{G}$ denote the undirected graph
on the vertex set of $G$ in which two vertices are adjacent if and only if they are connected in $G$
by two oppositely directed edges. Observe that if $G$ is distributed according to $\vec{G}(n,p)$ then $\hat{G}$ is
distributed according to $G(n,p^2)$. For a graph $G$ (directed or not) and $\ell \in \mathbb{N}$ we denote by $I_{G,\ell}$ the
indicator variable of the event "There exists an $\ell$-index code for $G$ over $\mathbb{F}$". Notice that for every
directed graph $G$ we have $I_{G,\ell} \ge I_{\hat{G},\ell}$, since every $\ell$-index code for $\hat{G}$ is an $\ell$-index code for $G$ as
well. We get that

$$P_2 = \sum_{G \in \vec{G}(n,p)} I_{G,\ell} \cdot \Pr[G] \ge \sum_{G \in \vec{G}(n,p)} I_{\hat{G},\ell} \cdot \Pr[G] = \sum_{G' \in G(n,p^2)} I_{G',\ell} \cdot \Pr[G'] = P_1,$$

where the second equality holds since the probability of a graph $G'$ according to the distribution
$G(n,p^2)$ equals the sum of the probabilities of all the graphs $G$ satisfying $\hat{G} = G'$ according to the
distribution $\vec{G}(n,p)$.

The proof of the inequality $P_2 \le P_3$ is similar. For a directed graph $G$, let $\tilde{G}$ denote the
undirected graph on the vertex set of $G$ in which two vertices are adjacent if and only if they are
connected in $G$ by at least one directed edge. Observe that if $G$ is distributed according to $\vec{G}(n,p)$
then $\tilde{G}$ is distributed according to $G(n,p(2-p))$ and that for every directed graph $G$ we have
$I_{G,\ell} \le I_{\tilde{G},\ell}$. We get that

$$P_2 = \sum_{G \in \vec{G}(n,p)} I_{G,\ell} \cdot \Pr[G] \le \sum_{G \in \vec{G}(n,p)} I_{\tilde{G},\ell} \cdot \Pr[G] = \sum_{G' \in G(n,p(2-p))} I_{G',\ell} \cdot \Pr[G'] = P_3,$$

where the second equality holds since the probability of a graph $G'$ according to the distribution
$G(n,p(2-p))$ equals the sum of the probabilities of all the graphs $G$ satisfying $\tilde{G} = G'$ according
to the distribution $\vec{G}(n,p)$.

Finally, assume that the codes in the three events satisfy one (or more) of the properties
mentioned in the statement of the lemma. It can be seen that an almost identical proof yields the
result, since the inequalities $I_{\hat{G},\ell} \le I_{G,\ell} \le I_{\tilde{G},\ell}$ remain true when we require the code to have such
a property in the definition of the indicator $I_{G,\ell}$. ■
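
The coupling used in the proof can be simulated directly. The Python sketch below samples one directed graph, derives the two undirected graphs used in the argument (pairs joined in both directions, and pairs joined in at least one direction), and compares their edge densities with $p^2$ and $p(2-p)$. All function names and parameter values are illustrative assumptions.

```python
import random

def sample_directed(n, p, rng):
    """Each ordered pair (i, j), i != j, is an edge independently with probability p."""
    return {(i, j) for i in range(n) for j in range(n) if i != j and rng.random() < p}

def both_directions(n, arcs):
    """Undirected edges {i, j} present in both directions in the directed graph."""
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if (i, j) in arcs and (j, i) in arcs}

def at_least_one_direction(n, arcs):
    """Undirected edges {i, j} present in at least one direction."""
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if (i, j) in arcs or (j, i) in arcs}

rng = random.Random(1)
n, p = 200, 0.3
arcs = sample_directed(n, p, rng)
lower = both_directions(n, arcs)
upper = at_least_one_direction(n, arcs)
pairs = n * (n - 1) // 2
density_lower = len(lower) / pairs  # concentrates around p**2
density_upper = len(upper) / pairs  # concentrates around p * (2 - p)
```

The inclusion `lower <= upper` reflects the chain of side information: a code for the sparser undirected graph also works for the directed graph, and a code for the directed graph also works for the denser undirected graph.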



^3 See Definitions 5.1 and 6.1.



4 The $\Omega(\sqrt{n})$ Lower Bound

In this section we prove that $\mathrm{minrk}_{\mathbb{F}}(G(n,p)) \ge \Omega(\sqrt{n})$ almost surely. By Lemma 3.1 it suffices to
prove the lower bound for the directed random graph $\vec{G}(n,p)$. We start with some intuition. Fix
a linear $\ell$-index code generated by $E \in \mathbb{F}^{n \times \ell}$ for certain $\ell = O(\sqrt{n})$. Our goal is to show that the
probability that $E$ is an index code for $\vec{G}(n,p)$ is exponentially small, so that applying the union
bound over all the codes $E$ will give us the result. As mentioned before, if $E$ is an index code for a
graph on the vertex set $[n]$ then every vertex $i$ is satisfied by a vector in $\mathrm{span}(E)$, i.e., there exists
a vector $v \in \mathrm{span}(E)$ such that $v_i \neq 0$ and $v_j = 0$ for all $j \neq i$ for which $(i,j)$ is not an edge in
$G$. It is not hard to verify that any vector in $\mathrm{span}(E)$ of Hamming weight $r$, whose $i$th entry is
nonzero, satisfies a vertex $i$ with probability $p^{r-1}$. Using this, we show that the probability that
at least $\frac{n}{2}$ vertices are satisfied by vectors of high Hamming weight is small (Lemma 4.1). On the
other hand, we use Claim 2.3 to show that at most $\frac{n}{2}$ vertices can be satisfied by vectors of low
Hamming weight (Lemma 4.2). This implies that with high probability there exists a vertex in the
graph which is not satisfied by any vector in $\mathrm{span}(E)$, and hence with such probability, $E$ is not an
index code for the graph.

The following lemma bounds from above the probability that the graph $\vec{G}(n,p)$ has an index
code for which many vertices are satisfied by vectors of high Hamming weight.

Lemma 4.1. For every field $\mathbb{F}$ and $n, \ell, r, s \in \mathbb{N}$, the probability that there exist a linear $\ell$-index code $E \in
\mathbb{F}^{n \times \ell}$ for $\vec{G}(n,p)$ over $\mathbb{F}$ and $s$ vertices, each of which is satisfied by a vector in $\mathrm{span}(E) \setminus B_n(r)$, is at most

$$\binom{n}{s} \cdot |\mathbb{F}|^{n\ell} \cdot \left(|\mathbb{F}|^{\ell} \cdot p^{r}\right)^{s}.$$

Proof: Fix a linear $\ell$-index code $E$ for $\vec{G}(n,p)$ over $\mathbb{F}$ and a set $S \subseteq [n]$ of $s$ vertices. The probability
that a vertex $i$ is satisfied by a fixed vector $v \in \mathrm{span}(E) \setminus B_n(r)$ is at most $p^r$. To see this, notice
that $v$ has more than $r$ nonzero entries, that every vertex (except $i$) which corresponds to a nonzero
entry of $v$ must be a neighbor of $i$, and that this happens independently with probability $p$. Taking
the union bound over all the vectors in $\mathrm{span}(E) \setminus B_n(r)$, we get that the probability that a vertex is
satisfied by a vector in $\mathrm{span}(E) \setminus B_n(r)$ is at most $|\mathbb{F}|^{\ell} \cdot p^{r}$. Hence, by the independence of the edges
in $\vec{G}(n,p)$, the probability that every vertex in $S$ is satisfied by a vector in $\mathrm{span}(E) \setminus B_n(r)$ is at most
$\left(|\mathbb{F}|^{\ell} \cdot p^{r}\right)^{s}$. Now, apply the union bound over all the matrices $E$ and sets $S$ to get the desired bound. ■
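
To get a feeling for when the bound derived in this proof, $\binom{n}{s} \cdot |\mathbb{F}|^{n\ell} \cdot (|\mathbb{F}|^{\ell} \cdot p^{r})^{s}$, is meaningful, one can evaluate its logarithm numerically. The Python sketch below does so; the function names are hypothetical, and the parameter choices $s = n/2$ and $r = n/(2\ell)$ in the example are assumptions made only for illustration.

```python
import math

def log2_binom(n, k):
    """log2 of the binomial coefficient C(n, k), via the log-gamma function."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(2)

def log2_union_bound(n, ell, r, s, field_size, p):
    """log2 of C(n, s) * |F|^(n*ell) * (|F|^ell * p^r)^s."""
    log_f = math.log2(field_size)
    return log2_binom(n, s) + n * ell * log_f + s * (ell * log_f + r * math.log2(p))

n, field_size, p = 10_000, 2, 0.5
small_ell, large_ell = 20, 100   # well below sqrt(n), and equal to sqrt(n)
small = log2_union_bound(n, small_ell, n // (2 * small_ell), n // 2, field_size, p)
large = log2_union_bound(n, large_ell, n // (2 * large_ell), n // 2, field_size, p)
```

With these illustrative parameters the logarithm is strongly negative for the smaller length, so the probability bound is tiny, while for the larger length the bound exceeds 1 and carries no information.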

Now we turn to deal with vertices which are satisfied by vectors of low Hamming weight and
to bound from above their number.

Lemma 4.2. For every field $\mathbb{F}$, a graph $G$, and a linear $\ell$-index code for $G$ over $\mathbb{F}$, at most $\frac{n}{2}$ vertices in $G$
are satisfied by vectors of Hamming weight at most $\frac{n}{2\ell}$.

Proof: Let $E \in \mathbb{F}^{n \times \ell}$ be a generator matrix of a linear $\ell$-index code for $G$ over $\mathbb{F}$. By Claim 2.3 the
number of indices of coordinates that are nonzero in at least one vector in $\mathrm{span}(E) \cap B_n(\frac{n}{2\ell})$ is at
most $\frac{n}{2}$. Recall that a vector which satisfies a vertex $i$ must have the $i$th entry nonzero. Hence, the
number of vertices that can be satisfied by vectors in $\mathrm{span}(E)$ of Hamming weight at most $\frac{n}{2\ell}$ is at
most $\frac{n}{2}$. ■

The $\Omega(\sqrt{n})$ lower bound follows from combining Lemmas 4.1 and 4.2.

Theorem 4.3. For every field $\mathbb{F}$ and $p \in (0,1)$, almost surely

$$\mathrm{minrk}_{\mathbb{F}}(G(n,p)) = \Omega\left(\sqrt{n \cdot \frac{\log\frac{1}{p}}{\log|\mathbb{F}|}}\right).$$



Proof: Takei < \fn- \ g^^ Li - By Lemma |3.1l it suffices to prove that minrkFf Gf n, p)) > ^almost 

surely. Let G be a graph distributed according to G{n, p). Let A denote the event that there exist 
a linear £-index code E for G over F and ^ vertices in G, each of which is satisfied by a vector in 

span(E) \ Bn{^). By LemmaHH 

On the other hand, by Lemma |4!2l there is no linear £-index code E for G over F and ^ vertices in 
G, each of which is satisfiedby a vector inspan(E) riBniji). This implies that almost surely there 
is no linear £-index code for G over F. H 

5 Locally Decodable Index Codes 

In this section we study locally decodable index codes defined as follows. 

Definition 5.1. A $(q,\ell)$-locally decodable index code is an $\ell$-index code in which the query complexity
of the decoding is at most $q$. This means that for every $i$ the decoding function $D_i$ of the $i$th receiver queries
at most $q$ characters from the encoded message.

Remark 5.2. For every graph $G$, the minimum $\ell$ for which there is a $(1,\ell)$-locally decodable index code for
$G$ over $\mathbb{F}$ is the clique cover number $\chi(\overline{G})$ of $G$.

The following theorem shows a lower bound on the length of a linear locally decodable index
code for $G(n,p)$ over $\mathbb{F}$. Although more involved, its proof follows the nature of the proof given
for Theorem 4.3.

Theorem 5.3. For every constant size field $\mathbb{F}$ and a constant $p \in (0,1)$, there exist constants $c_1, c_2 > 0$
such that if $\ell < c_1 \cdot \frac{n^{2/3}}{\log^{1/3} n}$ and $q < c_2 \cdot \frac{n}{\ell \log \ell}$ then almost surely there is no linear $(q,\ell)$-locally decodable
index code for $G(n,p)$ over $\mathbb{F}$.

Proof: Let $\ell$ and $q$ be as in the theorem and define $r = \frac{10 \ell \log|\mathbb{F}|}{\log\frac{1}{p}}$. By Lemma 3.1 it suffices to show
that almost surely there is no linear $(q,\ell)$-locally decodable index code for $\vec{G}(n,p)$ over $\mathbb{F}$. For a
graph $G$ distributed according to $\vec{G}(n,p)$ consider the following two events:

• $A_1$: there exist a linear $\ell$-index code $E$ for $G$ and $\frac{n}{4}$ vertices in $G$, each of which is satisfied by
a vector in $\mathrm{span}(E) \setminus B_n(r)$.

• $A_2$: there exist a subspace $W \subseteq \mathbb{F}^n$ and a set $S$ of $\frac{n}{4}$ vertices in $G$ such that

1. $W$ is spanned by $\ell$ vectors of Hamming weight in $(\frac{n}{2\ell}, r]$, and

2. there exists a set $U \subseteq W$ of $\ell$ vectors such that every vertex in $S$ is satisfied by a
vector which is a linear combination of at most $q$ vectors in $U$ and has Hamming weight
greater than $\frac{n}{2\ell}$.

The following lemma reduces the lower bound in the theorem to analyzing the probabilities of
$A_1$ and $A_2$.

Lemma 5.4. If there is a linear $(q,\ell)$-locally decodable index code $E$ for $G$ over $\mathbb{F}$ then at least one of $A_1$
and $A_2$ occurs.
Proof: Let $e_1, \ldots, e_\ell \in \mathbb{F}^n$ be the vectors for which every $x \in \mathbb{F}^n$ is mapped by $E$ to $E(x) =
(\langle e_1, x \rangle, \langle e_2, x \rangle, \ldots, \langle e_\ell, x \rangle)$. By Lemma 4.2 there are at most $\frac{n}{2}$ vertices in $G$ that are satisfied by
vectors in $\mathrm{span}(E)$ of Hamming weight at most $\frac{n}{2\ell}$. Assume that $A_1$ does not occur. This implies
that there exists a set $S$ of $\frac{n}{4}$ vertices in $G$ which are satisfied by vectors in $\mathrm{span}(E)$ with Hamming
weight in $(\frac{n}{2\ell}, r]$ (and are not satisfied by any other vectors in $\mathrm{span}(E)$). Let $y_1, \ldots, y_{n/4}$ be vectors
that satisfy the vertices in $S$, and let $W$ be their linear span. Notice that $W$ is spanned by $\ell$ of these
vectors, thus Item 1 of event $A_2$ holds. Since $W \subseteq \mathrm{span}(E)$, the subspace $\mathrm{span}(E)$ equals the direct
sum $W \oplus V$ for some subspace $V \subseteq \mathbb{F}^n$. Thus, for every $j \in [\ell]$, the vector $e_j$ can be uniquely
written as $e_j = w_j + v_j$ for some $w_j \in W$ and $v_j \in V$.

We claim that the set $U = \{w_1, \ldots, w_\ell\}$ satisfies the requirement in Item 2 of event $A_2$. Indeed,
for every vertex $z \in S$ there exist a set $J \subseteq [\ell]$ of size at most $q$ and coefficients $a_j \in \mathbb{F}$ for $j \in J$
such that $z$ is satisfied by the vector $w_z = \sum_{j \in J} a_j \cdot e_j \in W$ which has Hamming weight in $(\frac{n}{2\ell}, r]$.
Notice that $w_z = \sum_{j \in J} a_j \cdot w_j + \sum_{j \in J} a_j \cdot v_j$, where $\sum_{j \in J} a_j \cdot w_j \in W$ and $\sum_{j \in J} a_j \cdot v_j \in V$. Since a
vector in $\mathrm{span}(E)$ can be uniquely written as a sum of a vector in $W$ and a vector in $V$, we must
have $w_z = \sum_{j \in J} a_j \cdot w_j$. This yields that the vertex $z$ is satisfied by the vector $w_z$ which is a linear
combination of at most $q$ vectors in $U$ and has Hamming weight greater than $\frac{n}{2\ell}$. We conclude that
$A_2$ occurs, and we are done. ■

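We turn to prove that each of $A_1$ and $A_2$ occurs with probability exponentially small in $n$, and
this implies, by the union bound, that there is no linear $(q,\ell)$-locally decodable index code for $G$ almost surely.
By Lemma 4.1 and the definition of $r$,

$$\Pr[A_1] \le \binom{n}{n/4} \cdot |\mathbb{F}|^{n\ell} \cdot \left(|\mathbb{F}|^{\ell} \cdot p^{r}\right)^{n/4} \le 2^{n} \cdot |\mathbb{F}|^{n\ell} \cdot \left(|\mathbb{F}|^{\ell} \cdot |\mathbb{F}|^{-10\ell}\right)^{n/4} = |\mathbb{F}|^{-\Omega(n\ell)}.$$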

To bound from above the probability of $A_2$, we use the union bound over all the subspaces $W$,
sets $S$ and sets $U$. The number of subspaces $W$ is at most the number of spanning sets, which is
bounded by $(\binom{n}{r} \cdot |\mathbb{F}|^{r})^{\ell}$, by Item 1 of event $A_2$. The number of sets $S$ is $\binom{n}{n/4}$, and the number of
sets $U \subseteq W$ of size $\ell$ is at most $|\mathbb{F}|^{\ell^2}$. The probability that a vertex in $S$ is satisfied by a vector of
Hamming weight greater than $\frac{n}{2\ell}$ is at most $p^{\frac{n}{2\ell}}$, and we take the union bound over all the vectors
that can be written as a linear combination of at most $q$ vectors in $U$, whose number is at most
$\binom{\ell}{q} \cdot |\mathbb{F}|^{q}$. Recall that $r = \Theta(\ell)$ and observe that

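\[ \Pr[A_2] \le \left(\binom{n}{r} \cdot |\mathbb{F}|^{r}\right)^{\ell} \cdot \binom{n}{n/4} \cdot |\mathbb{F}|^{\ell^2} \cdot \left(\binom{\ell}{q} \cdot |\mathbb{F}|^{q} \cdot p^{\frac{n}{2\ell}}\right)^{n/4} \le 2^{O(\ell^2 \log n + n + \ell^2 + nq \log \ell) - \Omega(n^2/\ell)} = 2^{-\Omega(n^2/\ell)}, \]
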

where the last equality follows from our assumptions on $\ell$ and $q$ for an appropriate choice for $c_1$
and $c_2$. ■

6 Low Density Generator Matrix Index Codes 

In this section we study low density generator matrix index codes (or, in short, low density index
codes). As will be presented in detail shortly, to obtain our lower bounds (in Section 6.2) we use
proof techniques that differ significantly from those previously presented. A formal definition of
low density index codes follows.

Definition 6.1. A $(q, \ell)$-low density index code is a linear $\ell$-index code in which every character of the
sent word affects at most $q$ characters in the encoded message. Equivalently, it is a linear $\ell$-index code whose
generator matrix has at most $q$ nonzero entries in a row.
Remark 6.2. For every graph $G$, the minimum $\ell$ for which there is a $(1, \ell)$-low density index code for $G$
over $\mathbb{F}$ is the clique cover number $\chi(\overline{G})$ of $G$.
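
One direction of the remark can be illustrated concretely. The sketch below works over $\mathbb{F}_2$ only and assumes the clique cover is given as a partition of the vertices into cliques; the function names (`generator_matrix`, `encode`, `decode`) are illustrative and not taken from the paper. Each vertex contributes to exactly one encoded character, so every row of the generator matrix has a single nonzero entry.

```python
def generator_matrix(n, cliques):
    """n x l 0/1 matrix with entry [v][c] = 1 iff vertex v lies in clique c."""
    return [[1 if v in clique else 0 for clique in cliques] for v in range(n)]


def encode(x, cliques):
    """Send one character per clique: the sum (mod 2) of the bits of its vertices."""
    return [sum(x[v] for v in clique) % 2 for clique in cliques]


def decode(i, message, side_info, cliques):
    """Receiver i recovers x[i] from the message and the bits it already knows.

    side_info maps each neighbor j of i to x[j]; all other vertices of the
    clique containing i must be neighbors of i, otherwise a KeyError is raised.
    """
    c = next(idx for idx, clique in enumerate(cliques) if i in clique)
    others = [j for j in cliques[c] if j != i]
    return (message[c] - sum(side_info[j] for j in others)) % 2
```

For a triangle on vertices 0, 1, 2 plus an isolated vertex 3, the cover `[[0, 1, 2], [3]]` gives a code of length 2 in this sketch.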

6.1 The Reduction to $q = \omega(1)$

The following theorem shows that in order to prove an $\omega(\sqrt{n})$ lower bound on the minimum
length of a linear index code for $G(n, p)$ over a field $\mathbb{F}$, it is enough to prove such a lower bound
on the length of a low density index code for $G(n, p)$ over $\mathbb{F}$ for some $q = \omega(1)$. For simplicity, we
state and prove the result for the directed graph $G(n, p)$, but using the corresponding lemma of
Section 3 one can obtain a similar result for the undirected random graph $G(n, p)$.

Theorem 6.3. For every field $\mathbb{F}$ and $p \in (0,1)$, if the probability that $G(n, p)$ has a $(q, \ell)$-low density
index code over $\mathbb{F}$ is $2^{-\omega(n)}$ for some $q = \omega(1)$ and $\ell = \omega(\sqrt{n})$, then the minimum length of a linear
index code for $G(n, p)$ over $\mathbb{F}$ is almost surely $\omega(\sqrt{n})$.

Proof: Fix $\mathbb{F}$ and $p \in (0, 1)$. We start by rephrasing the assumption in the theorem statement to
allow a more structured proof. Namely, it follows from the theorem's assumption that there exists
a non-decreasing function $g : \mathbb{N} \to (0, \infty)$ satisfying $g(n) = \omega(1)$ such that the probability that
$G(n, p)$ has a $(g(n)^2, \sqrt{n} \cdot g(n))$-low density index code over $\mathbb{F}$ is at most $2^{-8n}$. We will prove that
almost surely there is no linear $\ell$-index code for $G(n, p)$ over $\mathbb{F}$ for $\ell = \sqrt{n} \cdot f(n)$, where
$f : \mathbb{N} \to (0, \infty)$ is the function defined by
$f(n) = \min\left(\frac{1}{2}, \sqrt{\frac{\log(1/p)}{40 \log |\mathbb{F}|}}\right) \cdot g(\lceil n/4 \rceil)$. Notice that $\ell = \omega(\sqrt{n})$ and
hence the theorem will follow.

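Let $G$ be a graph distributed according to $G(n, p)$, and consider the following two events:

• $A_1$: there exists a set of $\frac{n}{4}$ vertices in $G$ whose induced subgraph has a
$\left(\frac{40 f(n)^2 \log |\mathbb{F}|}{\log(1/p)}, \ell\right)$-low density index code.

• $A_2$: there exist a linear $\ell$-index code $E$ for $G$ and $\frac{n}{2}$ vertices in $G$, each of which is satisfied by
a vector in $\mathrm{span}(E) \setminus B_n\left(\frac{10 \ell \log |\mathbb{F}|}{\log(1/p)}\right)$.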

First, we claim that every graph $G$ that has a linear $\ell$-index code must satisfy at least one of
the events $A_1$ and $A_2$. To see why, consider a graph $G$ that has a linear $\ell$-index code $E \in \mathbb{F}^{n \times \ell}$ and
does not satisfy $A_2$. Observe that $G$ has a set $S$ of $\frac{n}{2}$ vertices that are satisfied by vectors in $\mathrm{span}(E)$
of Hamming weight at most $\frac{10 \ell \log |\mathbb{F}|}{\log(1/p)}$. Take a maximal linearly independent subset of the $\frac{n}{2}$ vectors
which satisfy the vertices in $S$ and restrict them to the coordinates that correspond to vertices in $S$.
The matrix with these restricted vectors as columns has at most $\ell$ columns and consists of at most
$\ell \cdot \frac{10 \ell \log |\mathbb{F}|}{\log(1/p)} = \frac{10 n f(n)^2 \log |\mathbb{F}|}{\log(1/p)}$ nonzero entries. This implies that this matrix has at least $\frac{n}{4}$ rows each of
which has at most $\frac{40 f(n)^2 \log |\mathbb{F}|}{\log(1/p)}$ nonzero entries. Restricting the matrix to these $\frac{n}{4}$ rows, we get that
$A_1$ holds.

Now, we turn to bound from above the probabilities of the events $A_1$ and $A_2$. Clearly, every
induced subgraph of $G(n, p)$ on $\frac{n}{4}$ vertices is distributed according to $G(\frac{n}{4}, p)$. Therefore, the
probability that it has a $(g(\lceil n/4 \rceil)^2, \sqrt{n/4} \cdot g(\lceil n/4 \rceil))$-low density index code over $\mathbb{F}$ is at most
$2^{-8 \cdot \frac{n}{4}} = 2^{-2n}$. Using the definition of $f$ and the union bound taken over the subsets of $[n]$ of
size $\frac{n}{4}$, we obtain

\[ \Pr[A_1] \le \binom{n}{n/4} \cdot 2^{-2n} \le 2^{-n}. \]
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By Lemma l^n 

/ \ 4Mog|F| 



By the union bound the probability that at least one of the events occurs is smaller than $2^{-\Omega(n)}$;
thus $G(n, p)$ has a linear $\ell$-index code with at most this probability. ■

6.2 The Lower Bounds for $q \in \{2, 3\}$

The following theorem says that every index code for $G(n, p)$ whose generator matrix has at most
3 nonzero entries in a row has length $\omega(\sqrt{n})$.

Theorem 6.4. For every field $\mathbb{F}$ and a sufficiently small $\varepsilon > 0$ there exists a $p' = p'(|\mathbb{F}|, \varepsilon) > 0$ such that
for any $p \in (0, p')$ the following holds almost surely.

1. If there is a $(2, \ell)$-low density index code for $G(n, p)$ over $\mathbb{F}$ then $\ell > n^{1 - \varepsilon}$.

2. If there is a $(3, \ell)$-low density index code for $G(n, p)$ over $\mathbb{F}$ then $\ell > n^{\frac{2}{3} - \varepsilon}$.

The approach in the proof of Theorem 6.4 is different from the one taken in the previous proofs.
As before, fix a linear index code for $G(n, p)$ and denote its generator matrix by $E \in \mathbb{F}^{n \times \ell}$. Assume
that every row of $E$ consists of at most $q$ nonzero entries for $q \in \{2, 3\}$. Let $i$ be a vertex and let $A$
denote the set of rows in $E$ that correspond to non-neighbors of $i$. We are asking if there exists a
vector $v \in \mathrm{span}(E)$ which satisfies $i$ (i.e., $v_i \neq 0$ and $v_j = 0$ for all $j \neq i$ to which $i$ is not connected).
One can show that there exists such a vector if and only if the $i$th row of $E$ cannot be written as a
linear combination of a subset of the rows in $A$. The reason is that such a subset of rows enforces
any vector in $\mathrm{span}(E)$ all of whose entries corresponding to rows in $A$ are zero, to have zero in the
$i$th entry as well, and in particular not to satisfy $i$.
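
For small examples this criterion can be checked mechanically. The following is a minimal sketch over $\mathbb{F}_2$ only, assuming each row of $E$ is stored as an integer bitmask and the graph as a list of neighbor sets; the names `gf2_rank` and `has_satisfying_vector` are illustrative and do not come from the paper. It tests whether the row of vertex $i$ lies outside the span of the rows of its non-neighbors by comparing ranks.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a list of rows, each row given as an int bitmask."""
    pivots = {}  # leading-bit position -> stored row with that leading bit
    for r in rows:
        while r:
            top = r.bit_length() - 1
            if top not in pivots:
                pivots[top] = r
                break
            r ^= pivots[top]  # cancel the leading bit and keep reducing
    return len(pivots)


def has_satisfying_vector(E_rows, neighbors, i):
    """True iff some v in the column span of E has v_i != 0 and v_j = 0
    for every non-neighbor j != i of vertex i (checked over GF(2))."""
    A = [E_rows[j] for j in range(len(E_rows))
         if j != i and j not in neighbors[i]]
    # Row i lies outside span(A) exactly when adding it raises the rank.
    return gf2_rank(A + [E_rows[i]]) > gf2_rank(A)
```

With rows `0b01`, `0b10`, `0b11` and an empty graph, the third row is the sum of the other two, so under these assumptions vertex 2 has no satisfying vector.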

In our proof we show that every matrix $E \in \mathbb{F}^{n \times \ell}$ has many small sets $F$ of rows which are
minimally linearly dependent (where minimality is with respect to containment). As we will show
later on, this can be achieved using the assumption that $E$ has low density (at most $q$ nonzero
entries in a row). Notice that if the $i$th row of $E$ belongs to such an $F$, then the $i$th row of $E$ can be
written as a linear combination of the other $|F| - 1$ rows in $F$. If all the vertices that correspond to these
rows are non-neighbors of $i$ then $i$ has no satisfying vector in $\mathrm{span}(E)$. Therefore, the probability
that $i$ has a satisfying vector is at most $1 - (1 - p)^{|F| - 1}$.

Our construction of minimally linearly dependent row sets of $E$ is based on a result of Naor
and Verstraëte [20]. They studied the maximum size of a set of vectors in $\mathbb{F}^N$ with Hamming
weight at most $q$ in which every subset of size $k$ is linearly independent over $\mathbb{F}$. For $\mathbb{F}_2$, this is the
maximum number of edges of size at most $q$ in a hypergraph on $N$ vertices which does not contain
an even cover¹ of size at most $k$. We now add the notion of a dependence set and use it to state
the result of [20]. We note that we use their result only for $q \in \{2, 3\}$, as for larger $q$ our approach
does not improve upon the $\Omega(\sqrt{n})$ bound given in Theorem 4.3.

Definition 6.5. A subset of $\mathbb{F}^N$ is a $k$-dependence set if it is a linearly dependent set over $\mathbb{F}$ whose size is
at most $k$.
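
To make the definition concrete, here is a brute-force sketch over $\mathbb{F}_2$ only, where a set of vectors is linearly dependent exactly when some nonempty subset sums to zero. Vectors are assumed to be stored as integer bitmasks, and the name `find_k_dependence_set` is illustrative. The search is exponential and is meant only for tiny examples.

```python
from functools import reduce
from itertools import combinations
from operator import xor


def find_k_dependence_set(vectors, k):
    """Return the indices of a smallest subset of at most k vectors whose
    sum over GF(2) is zero, or None if no such subset exists.

    Because subsets are tried in order of increasing size, the returned set
    is minimally dependent (no proper nonempty subset sums to zero).
    """
    for size in range(1, k + 1):
        for idx in combinations(range(len(vectors)), size):
            if reduce(xor, (vectors[i] for i in idx), 0) == 0:
                return idx
    return None
```

For vectors of Hamming weight 2, which can be read as edges of a graph, such a set corresponds to a cycle: the three edges of a triangle, `0b011`, `0b110` and `0b101`, form a 3-dependence set.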

Theorem 6.6 ( 11201 ). For every field F, zj G N and k > 8, there exists a constant c = c{\F\,q,k) > 

n ,T 2 1 JiiZiL 

for which the following holds for every N G N. Every subsel^ofW of at least c ■ N^ ^l^/sj x^ectors with 

Hamming weight at most q, contains a k-dependence set. 



¹ An even cover is a non-empty set of edges such that every vertex belongs to an even number of them.
² Throughout this section we allow multiplicities in the vector sets.
Equipped with Theorem 6.6 we are ready to state and prove a lemma on the existence of many
small dependence sets in a given set of vectors. The basic idea in constructing these sets is to apply
Theorem 6.6 iteratively. However, it turns out (as will be explained later) that in order to avoid
dependencies in our probability analysis we need every two dependence sets of the construction
to share at most one element. In what follows we add the notion of a 1-intersecting family of sets
and then state and prove our lemma.

Definition 6.7. A family $\mathcal{F}$ of sets is 1-intersecting if every two distinct sets $A, B \in \mathcal{F}$ share at most one common
element.
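
The definition translates directly into a pairwise check. The sketch below assumes the family is given as an iterable of Python sets of hashable elements (for instance, row indices); the name `is_one_intersecting` is illustrative.

```python
from itertools import combinations


def is_one_intersecting(family):
    """True iff every two distinct sets in the family share at most one element."""
    distinct = {frozenset(s) for s in family}  # repeated copies of a set are not distinct
    return all(len(a & b) <= 1 for a, b in combinations(distinct, 2))
```

For example, the family `[{1, 2, 3}, {3, 4, 5}, {5, 6, 1}]` passes the check, while `[{1, 2, 3}, {2, 3, 4}]` does not, since its two sets share both 2 and 3.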

Lemma 6.8. For every field $\mathbb{F}$, $q \in \mathbb{N}$, a sufficiently small $\varepsilon > 0$ and a sufficiently large $N \in \mathbb{N}$ the
following holds. For every set $A \subseteq \mathbb{F}^N$ of at least $N^{\frac{q}{2} + \varepsilon}$ vectors with Hamming weight at most $q$, there
exists a 1-intersecting family J^ C P{A) of k-dependence sets for some k < -j- that satisfies 

\[ |\mathcal{F}| \ge \Omega\left(\frac{N^{\frac{q}{2} + \varepsilon} \log N}{k \log k}\right). \]

Proof: Let A be a set of vectors in F^ with Hamming weight at most q and assume that \A\ > 

2lk/8\' 



j^2+2i! Lgt ]^ \)Q |.]-,g smallest integer so that e > 2U/J1 , and notice that k < -^ for any small enough 



e. 

We construct a family $\mathcal{F} \subseteq P(A)$ of $k$-dependence sets as follows. Start with $\mathcal{F} = \emptyset$ and
$A' = A$. As long as $|A'| \ge N^{\frac{q}{2} + \frac{\varepsilon}{2}}$, add to $\mathcal{F}$ a $k$-dependence set $F \subseteq A'$, whose existence is
guaranteed by Theorem 6.6 using our choice of $k$, and continue with $A' \setminus F$. Notice that in this
way we collect at least $\frac{|A|}{2k}$ $k$-dependence sets. Now, partition $A$ into $k$ sets $A_1, \ldots, A_k$ of size $\frac{|A|}{k}$
each, so that no $F$ which was added to $\mathcal{F}$ in the previous step shares more than one element in
common with some $A_j$. To achieve this, partition the elements of every such $F$ into the $k$ sets,
at most one element in each $A_j$, and then partition the remaining elements in a way that all the
$A_j$'s have (roughly) the same size. We continue recursively with the $A_j$'s and add the new
$k$-dependence sets to the same $\mathcal{F}$.

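In the second iteration of the recursion we get at least $\frac{|A_j|}{2k} \ge \frac{|A|}{2k^2}$ new $k$-dependence sets from
every $A_j$, so their total number is at least $\frac{|A|}{2k}$. In the $i$th iteration of the recursion there are $k^{i-1}$
sets of size $\frac{|A|}{k^{i-1}}$ each. Each of them contributes at least $\frac{|A|}{2k^{i}}$ $k$-dependence sets to $\mathcal{F}$ and their total
contribution is at least $\frac{|A|}{2k}$. Notice that the recursion does not terminate as long as the sets are of
size at least $N^{\frac{q}{2} + \frac{\varepsilon}{2}}$, and hence the recursion depth is at least $\log_k N^{\varepsilon/2}$. In each iteration we add to
$\mathcal{F}$ at least $\frac{|A|}{2k}$ $k$-dependence sets, so the final $\mathcal{F}$ satisfies

\[ |\mathcal{F}| = \Omega\left(\frac{|A|}{k} \cdot \log_k N^{\varepsilon/2}\right) = \Omega\left(\frac{N^{\frac{q}{2} + \varepsilon} \log N}{k \log k}\right). \]
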

Finally, observe that in every level of the recursion we get disjoint $k$-dependence sets. Also,
notice that the recursion is always applied to sets with intersection size at most 1 with every
$k$-dependence set that was previously added to $\mathcal{F}$. This implies that $\mathcal{F}$ is 1-intersecting. ■

Now we turn to prove Theorem 6.4.

Proof of Theorem 6.4: Let $q \in \{2, 3\}$, fix a $(q, \ell)$-low density index code for $G(n, p)$ over $\mathbb{F}$ with
$\ell = n^{\frac{2}{q} - \varepsilon}$, and denote its generator matrix by $E \in \mathbb{F}^{n \times \ell}$. The number of nonzero entries in a row
of $E$ is at most $q$. Let $A$ be the set of rows of $E$ (possibly with multiplicities). This is a set of
vectors in $\mathbb{F}^{\ell}$ with Hamming weight at most $q$. Notice that $|A| = n = \ell^{\frac{1}{2/q - \varepsilon}} = \ell^{\frac{q}{2} + \varepsilon'}$ for some
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e' > |. By Lemma \&M there is a 1-intersecting family J-" C P{A) of ^-dependence sets for some 
k < Y < Y' — T' S'^ch that \J^\ > Q(nlog£ • j-^)- Assume, without loss of generality, that the 
sets in $\mathcal{F}$ are minimal (i.e., do not contain any proper linearly dependent subset). For simplicity,
let us think of every set $F \in \mathcal{F}$ as a subset of $[n]$ that consists of the indices of the rows in $F$.

For every $k$-dependence set $F \in \mathcal{F}$ and $i \in F$, the vertex $i$ must be connected to at least one of
the other vertices in $F$. Otherwise, a satisfying vector of $i$ has zeros in all the entries with indices
in $F \setminus \{i\}$. Such a vector must have a zero in the $i$th entry as well, since the row that corresponds
to vertex $i$ can be written as a linear combination of the other rows in $F$. This yields that the vector
does not satisfy $i$. Therefore, with probability at least $(1 - p)^{|F| - 1} \ge (1 - p)^{k - 1}$ the vertex $i$ is not satisfied
by any vector in $\mathrm{span}(E)$.

Now, we apply this argument to every set $F \in \mathcal{F}$ and an arbitrarily chosen vertex $i \in F$ and
we bound from above the probability that $i$ is satisfied by a vector in $\mathrm{span}(E)$. Observe that these
events are independent, since if $i$ is the vertex that was chosen from the sets $F_1$ and $F_2$ then the sets
$F_1 \setminus \{i\}$ and $F_2 \setminus \{i\}$ are disjoint because $\mathcal{F}$ is 1-intersecting. So the probability that every vertex in
the graph is satisfied by a vector in $\mathrm{span}(E)$ is at most $(1 - (1 - p)^{k - 1})^{|\mathcal{F}|}$. Taking the union bound
over at most $(\binom{\ell}{q} \cdot |\mathbb{F}|^{q})^{n}$ generator matrices with at most $q$ nonzero entries in a row, we obtain
that the probability that there exists a $(q, \ell)$-low density generator matrix index code for $G(n, p)$
over $\mathbb{F}$ is at most


— ^ ' log i 



A . |F|?y ■ (1 - (1 - pf-^)\^\ < 2'^n(loge+lom) . (1 - (1 - p)T) 

To complete the proof, notice that for any small enough $p$ (depending only on $|\mathbb{F}|$ and $\varepsilon$) the above
tends exponentially to zero as $n$ tends to infinity. ■

7 Concluding Remarks and Open Questions 

In this paper we initiated the study of index coding for the random graph $G(n, p)$ over a field
$\mathbb{F}$ and introduced two new models of index coding: locally decodable index coding and low
density index coding. We proved several lower bounds on the length of linear index codes for
$G(n, p)$ (Theorems 4.3, 5.3 and 6.4) and showed that in order to improve the $\Omega(\sqrt{n})$ lower bound it
suffices to improve it for low density index codes (Theorem 6.3).

The main task left for further work is to obtain tighter bounds on the minimum length of index
codes for the random graph $G(n, p)$ over a field $\mathbb{F}$. More specifically, it is an open question whether there
exists an index code for $G(n, p)$ (linear or not) shorter than the one achieved by the clique cover.
It would be interesting to determine whether our lower bounds can be extended to general (non-linear)
index codes. It would also be nice to understand better how the limit on the number of queries affects
the length of locally decodable index codes for $G(n, p)$. We hope that the new notion of low density
index codes and Theorem 6.3 will prove useful in understanding the minrank of $G(n, p)$ over $\mathbb{F}$.

Another challenging research direction is to study the vector capacity of the random graph 
G{n, p) (see KTSllSllTll). Here, the sender wishes to broadcast a word x of n blocks, each of t bits, to 
$n$ receivers. The $i$th receiver is interested in the $i$th block and has side information consisting of a
subset of the other blocks according to $G(n, p)$. Denoting by $\beta_t$ the minimum number of bits that
has to be transmitted, we are interested in $\lim_{t \to \infty} \frac{\beta_t}{t}$. This limit represents the average
communication cost per bit in each block (for long blocks), and it will be very interesting to compare it to $\beta_1$
of a typical random graph.